The Curse of Dimensionality in Data Mining and Time Series Prediction

Authors

  • Michel Verleysen
  • Damien François

Abstract

Modern data analysis tools have to work on high-dimensional data whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that strongly influence the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon means that Euclidean norms and Gaussian kernels, both commonly used in models, become inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
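The concentration of the norm phenomenon can be observed numerically. The sketch below illustrates it under stated assumptions and is not the authors' experiment. It draws points whose components are i.i.d. Uniform(0, 1). This independence is assumed only for simplicity, since the abstract notes that real data components usually are not independent. It then reports two spread measures of the points' norms: the relative standard deviation (std/mean) and the relative contrast (max - min)/min. Both shrink as the dimension grows, so all points end up looking almost equally far from the origin.

The helper names `minkowski_norm` and `norm_concentration` are hypothetical. The `p` parameter is there only so the reader can try other Minkowski norms. The abstract does not say which alternative distance measures the paper itself proposes.

```python
import math
import random


def minkowski_norm(x, p=2.0):
    """Minkowski p-norm of a sequence; p=2 gives the Euclidean norm."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def norm_concentration(dim, n_points=500, p=2.0, seed=0):
    """Hypothetical helper: sample n_points vectors with i.i.d. Uniform(0, 1)
    components (an illustrative assumption) and return the relative standard
    deviation and the relative contrast of their p-norms."""
    rng = random.Random(seed)
    norms = [
        minkowski_norm([rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    mean = sum(norms) / len(norms)
    std = math.sqrt(sum((r - mean) ** 2 for r in norms) / len(norms))
    relative_std = std / mean
    relative_contrast = (max(norms) - min(norms)) / min(norms)
    return relative_std, relative_contrast


if __name__ == "__main__":
    # Both spread measures shrink as the dimension grows: the norms "concentrate".
    for d in (1, 10, 100, 1000):
        rel_std, contrast = norm_concentration(d)
        print(f"dim={d:5d}  std/mean={rel_std:.4f}  (max-min)/min={contrast:.4f}")
```

The components are i.i.d. Uniform(0, 1), so the relative standard deviation of the Euclidean norm decays roughly like 1/sqrt(5d). Going from d = 10 to d = 1000 should therefore shrink it by about a factor of ten.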


Similar resources

Modeling and prediction of time-series of monthly copper prices

One of the main tasks to analyze and design a mining system is predicting the behavior exhibited by prices in the future. In this paper, the applications of different prediction methods are evaluated in econometrics and financial management fields, such as ARIMA, TGARCH, and stochastic differential equations, for the time-series of monthly copper prices. Moreover, the performance of these metho...

Fuzzy clustering of time series data: A particle swarm optimization approach

With rapid development in information gathering technologies and access to large amounts of data, we always require methods for data analyzing and extracting useful information from large raw dataset and data mining is an important method for solving this problem. Clustering analysis as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...

Time-Series Classification in Many Intrinsic Dimensions

In the context of many data mining tasks, high dimensionality was shown to be able to pose significant problems, commonly referred to as different aspects of the curse of dimensionality. In this paper, we investigate in the time-series domain one aspect of the dimensionality curse called hubness, which refers to the tendency of some instances in a data set to become hubs by being included in un...

Feature Selection for Genomic and Proteomic Data Mining

The extreme dimensionality (also known as the curse of dimensionality) in genomic data has been traditionally a serious concern in many applications. This has motivated a lot of research in feature representation and selection, both aiming at reducing dimensionality of features to facilitate training and prediction of genomic data. In this chapter, N denotes the number of training data samples, M t...

Risk prediction based on a time series case study: Tazareh coal mine

In this work, the time series modeling was used to predict the Tazareh coal mine risks. For this purpose, initially, a monthly analysis of the risk constituents including frequency index and incidence severity index was performed. Next, a monthly time series diagram related to each one of these indices was for a nine year period of time from 2005 to 2013. After extrusion of the trend, seasonali...

Publication date: 2005